
[AutoSparkUT] Fix GpuScalar to preserve -0.0 for float/double (issue #14116) #14400

Open
wjxiz1992 wants to merge 2 commits into NVIDIA:main from wjxiz1992:fix/14116-normalize-float-subquery

Conversation

@wjxiz1992
Collaborator

Summary

  • Root cause: cuDF Scalar.fromDouble(-0.0) normalizes -0.0 to 0.0, losing the IEEE 754 sign bit. This caused scalar subquery results to return 0.0 where CPU returns -0.0, violating GPU-CPU parity.
  • Fix: In GpuScalar.from(), create float/double scalars via a 1-element ColumnVector + getScalarElement(0) instead of Scalar.fromDouble/Scalar.fromFloat. The column-based path preserves the exact bit pattern.
  • Recovered test: Removes the .exclude() for "normalize special floating numbers in subquery" in RapidsSQLQuerySuite.

Detail

The GpuScalarSubquery path extracts a JVM Double from the inner query result, then creates a GpuScalar via Scalar.fromDouble(value). cuDF's JNI layer (makeFloat64Scalar) normalizes -0.0 to 0.0 during scalar creation. This was confirmed by direct cuDF API testing:

cudf_scalar=0                    <- Scalar.fromDouble(-0.0) bits=0 (0.0)
cudf_col=0                       <- ColumnVector from that scalar also 0.0
plain_v1=8000000000000000        <- Direct SELECT preserves -0.0
subq_no_join=0                   <- Scalar subquery loses -0.0
join_no_subq=8000000000000000    <- Join without subquery preserves -0.0

The fix replaces Scalar.fromDouble(d) with ColumnVector.fromDoubles(d).getScalarElement(0), which preserves the bit pattern through the host-to-device column creation path.
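As an illustrative sketch (plain Java, no cuDF dependency; class and method names here are hypothetical, not from the PR), the negative-zero detection that the final version of this PR relies on can be expressed as a raw-bit comparison, since -0.0 is the only double whose bits equal 0x8000000000000000L even though (-0.0 == 0.0) is true under IEEE 754:

```java
public class NegZeroCheck {
  // Raw-bit comparison distinguishes -0.0 from 0.0, which an ordinary
  // == comparison cannot do (IEEE 754 defines -0.0 == 0.0 as true).
  static boolean isNegativeZeroDouble(double d) {
    return Double.doubleToRawLongBits(d) == Double.doubleToRawLongBits(-0.0d);
  }

  static boolean isNegativeZeroFloat(float f) {
    return Float.floatToRawIntBits(f) == Float.floatToRawIntBits(-0.0f);
  }

  public static void main(String[] args) {
    // Only negative zero would take the slow (column-based) path;
    // every other value keeps the direct scalar constructors.
    System.out.println(isNegativeZeroDouble(-0.0d)); // true
    System.out.println(isNegativeZeroDouble(0.0d));  // false
    System.out.println(isNegativeZeroFloat(-0.0f));  // true
    System.out.println(isNegativeZeroFloat(1.0f));   // false
  }
}
```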

RAPIDS test to Spark original mapping

RAPIDS test: normalize special floating numbers in subquery (inherited)
Spark original: normalize special floating numbers in subquery
Spark file: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (lines 3620-3636)

Test plan

  • mvn package -pl tests -am -Dbuildver=330 -DwildcardSuites=RapidsSQLQuerySuite -- succeeded 216, failed 0, ignored 17 (previously: succeeded 215, ignored 18)
  • The previously excluded test "normalize special floating numbers in subquery" now passes
  • No new test failures introduced

Closes #14116

Made with Cursor

…VIDIA#14116)

cuDF's Scalar.fromDouble(-0.0) normalizes -0.0 to 0.0, losing the sign
bit. This caused GPU scalar subqueries to return 0.0 where CPU correctly
returns -0.0, violating GPU-CPU parity.

Root cause: the JNI path Scalar.fromDouble -> makeFloat64Scalar drops
the IEEE 754 sign bit of negative zero during scalar creation.

Fix: in GpuScalar.from(), create float/double scalars via a 1-element
ColumnVector + getScalarElement(0) instead of Scalar.fromDouble/fromFloat.
The column-based path preserves the exact bit pattern.

This re-enables the previously excluded test "normalize special floating
numbers in subquery" in RapidsSQLQuerySuite.

Closes NVIDIA#14116

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor
Copilot AI review requested due to automatic review settings March 11, 2026 09:18
@greptile-apps
Contributor

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR fixes a GPU-CPU parity bug where GpuScalar.from() would lose the IEEE 754 sign bit of -0.0 for Float and Double types, because cuDF's Scalar.fromDouble/Scalar.fromFloat normalizes -0.0 to 0.0 at the JNI level.

Key changes:

  • In GpuScalar.from(), the DoubleType and FloatType branches now detect -0.0 via doubleToRawLongBits/floatToRawIntBits and, when the value is negative zero, create the scalar via a 1-element ColumnVector + getScalarElement(0) instead of Scalar.fromDouble/Scalar.fromFloat. All other values continue to use the cheaper direct scalar path.
  • The "normalize special floating numbers in subquery" exclusion is removed from RapidsTestSettings, allowing the previously skipped test to run and verify the fix.
  • Resource management in the new code paths is correct — withResource ensures the temporary ColumnVector is closed before the Scalar is returned.

Confidence Score: 4/5

  • This PR is safe to merge; the fix is narrowly scoped, logically correct, and validated by a recovered test.
  • The -0.0 detection logic using raw bit comparison is correct, the withResource usage prevents resource leaks, and the change is confined to a well-understood special case. Score is 4 rather than 5 only because of minor code duplication across the three case branches and the fact that analogous NaN-payload normalization issues (if they exist in cuDF) would not be covered by this fix.
  • No files require special attention; both changed files are straightforward.

Important Files Changed

sql-plugin/src/main/scala/com/nvidia/spark/rapids/literals.scala: Adds a -0.0 special-case for Double and Float scalar creation: when the value is negative zero, a 1-element ColumnVector is used instead of Scalar.fromDouble/Scalar.fromFloat to preserve the IEEE 754 sign bit. Logic and resource management are correct; minor code duplication across three case branches.
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala: Removes the .exclude("normalize special floating numbers in subquery", ...) entry, re-enabling the previously skipped test now that the underlying -0.0 scalar bug is fixed.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["GpuScalar.from(v, dataType)"] --> B{dataType}
    B --> |DoubleType| C{v type?}
    B --> |FloatType| G{v type?}

    C --> |"Double d"| D{"is negative zero?"}
    C --> |"Float f"| E["d = f.toDouble"] --> D

    D --> |yes| F["ColumnVector.fromDoubles(d) then getScalarElement(0) - preserves sign bit"]
    D --> |no| N["Scalar.fromDouble(d) - fast path"]

    G --> |"Float f"| H{"is negative zero?"}
    H --> |yes| I["ColumnVector.fromFloats(f) then getScalarElement(0) - preserves sign bit"]
    H --> |no| J["Scalar.fromFloat(f) - fast path"]

    F --> K[Return Scalar]
    N --> K
    I --> K
    J --> K

Comments Outside Diff (2)

  1. sql-plugin/src/main/scala/com/nvidia/spark/rapids/literals.scala, line 282-344 (link)

    Code duplication and simplified -0.0 detection

    The fix has duplicated withResource(ColumnVector.fromDoubles(d)) { cv => cv.getScalarElement(0) } blocks in three separate places (double/double, double/float, float/float). Additionally, the condition JDouble.doubleToRawLongBits(d) == JDouble.doubleToRawLongBits(-0.0d) re-computes doubleToRawLongBits(-0.0d) at each call site instead of using a constant. The check can be simplified to JDouble.doubleToRawLongBits(d) == 0x8000000000000000L (since -0.0d always has that bit pattern), and the duplicate blocks can be extracted into a helper. This is a style concern, not a correctness issue.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. sql-plugin/src/main/scala/com/nvidia/spark/rapids/literals.scala, line 292-300 (link)

    Float-to-Double -0.0 can bypass the column-vector path

    When a Float is provided for DoubleType, the code does val d = f.toDouble and then checks doubleToRawLongBits(d) == doubleToRawLongBits(-0.0d). IEEE 754 mandates that -0.0f.toDouble == -0.0d, so the check works correctly today. However, if cuDF ever also normalizes other Float-specific NaN payloads during float→double widening through ColumnVector, this path would still not catch them. This is not a bug for the current fix, but worth noting as a future-proofing consideration.
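A quick standalone check (plain Java, no cuDF involved) confirms the IEEE 754 point above: widening -0.0f to double preserves the sign bit, so a raw-bit comparison on the widened value still detects negative zero:

```java
public class WidenNegZero {
  public static void main(String[] args) {
    // float-to-double widening keeps the sign of zero
    double d = (double) -0.0f;

    // The raw bit pattern of the widened value is the double -0.0 pattern.
    System.out.println(Long.toHexString(Double.doubleToRawLongBits(d))); // 8000000000000000

    // An == comparison cannot distinguish the zeros; the raw-bit check can.
    System.out.println(d == 0.0d); // true (IEEE 754 treats the zeros as equal)
    System.out.println(Double.doubleToRawLongBits(d) == 0x8000000000000000L); // true
  }
}
```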

Last reviewed commit: dbc46f0

Contributor

Copilot AI left a comment


Pull request overview

Fixes GPU/CPU parity for scalar subquery results involving signed zero by ensuring GpuScalar.from preserves the IEEE-754 -0.0 bit pattern for float/double values, and re-enables the previously excluded Spark-derived test that covers this case.

Changes:

  • Update GpuScalar.from to create float/double cuDF scalars via a 1-element ColumnVector + getScalarElement(0) to preserve -0.0.
  • Remove the exclusion for "normalize special floating numbers in subquery" from Spark 3.3.0 test settings.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

sql-plugin/src/main/scala/com/nvidia/spark/rapids/literals.scala: Changes float/double scalar creation path to preserve -0.0 bit pattern.
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala: Re-enables the previously excluded subquery normalization test.


Comment on lines 283 to 294
case d: Double =>
  // cuDF Scalar.fromDouble normalizes -0.0 to 0.0 (see #14116).
  // Create via a 1-element column to preserve the exact bit pattern.
  withResource(ColumnVector.fromDoubles(d)) { cv =>
    cv.getScalarElement(0)
  }
case f: Float =>
  withResource(ColumnVector.fromDoubles(f.toDouble)) { cv =>
    cv.getScalarElement(0)
  }
case _ => throw new IllegalArgumentException(s"'$v: ${v.getClass}' is not supported" +
  s" for DoubleType, expecting Double or Float.")

Copilot AI Mar 11, 2026


Creating float/double scalars by materializing a 1-element ColumnVector will allocate/copy device memory on every scalar creation, which can be a noticeable perf/regression vs Scalar.fromDouble/fromFloat (GpuScalar.from is used in many code paths). Consider using the ColumnVector+getScalarElement workaround only for the specific problematic value (signed zero), and keep the fast Scalar.fromDouble path for all other doubles/floats.

Suggested change: replace

case d: Double =>
  // cuDF Scalar.fromDouble normalizes -0.0 to 0.0 (see #14116).
  // Create via a 1-element column to preserve the exact bit pattern.
  withResource(ColumnVector.fromDoubles(d)) { cv =>
    cv.getScalarElement(0)
  }
case f: Float =>
  withResource(ColumnVector.fromDoubles(f.toDouble)) { cv =>
    cv.getScalarElement(0)
  }
case _ => throw new IllegalArgumentException(s"'$v: ${v.getClass}' is not supported" +
  s" for DoubleType, expecting Double or Float.")

with

case d: Double =>
  // cuDF Scalar.fromDouble normalizes -0.0 to 0.0 (see #14116).
  // Use a 1-element column only for negative zero to preserve the exact bit pattern.
  if (d == 0.0 && JDouble.doubleToRawLongBits(d) == JDouble.doubleToRawLongBits(-0.0d)) {
    withResource(ColumnVector.fromDoubles(d)) { cv =>
      cv.getScalarElement(0)
    }
  } else {
    Scalar.fromDouble(d)
  }
case f: Float =>
  // Preserve negative zero for floats as well.
  if (f == 0.0f && JFloat.floatToRawIntBits(f) == JFloat.floatToRawIntBits(-0.0f)) {
    withResource(ColumnVector.fromDoubles(f.toDouble)) { cv =>
      cv.getScalarElement(0)
    }
  } else {
    Scalar.fromDouble(f.toDouble)
  }
case _ => throw new IllegalArgumentException(s"'$v: ${v.getClass}' is not supported" +
  s" for DoubleType, expecting Double or Float.")

Comment on lines +327 to +329
// Create via a 1-element column to preserve the exact bit pattern.
withResource(ColumnVector.fromFloats(f)) { cv =>
  cv.getScalarElement(0)

Copilot AI Mar 11, 2026


Same perf concern as the DoubleType branch: using ColumnVector.fromFloats(...).getScalarElement(0) for every float scalar creation adds extra allocations/copies. If the only correctness issue is preserving -0.0f, consider detecting signed zero (via raw bits) and only using the column-based workaround in that case, otherwise use Scalar.fromFloat.

Suggested change: replace

// Create via a 1-element column to preserve the exact bit pattern.
withResource(ColumnVector.fromFloats(f)) { cv =>
  cv.getScalarElement(0)

with

// For -0.0f, create via a 1-element column to preserve the exact bit pattern.
if (JFloat.floatToRawIntBits(f) == JFloat.floatToRawIntBits(-0.0f)) {
  withResource(ColumnVector.fromFloats(f)) { cv =>
    cv.getScalarElement(0)
  }
} else {
  Scalar.fromFloat(f)
Collaborator

@revans2 revans2 left a comment


Why? I get that Spark has a test for this, but what value is there in keeping -0.0 instead of normalizing it to 0.0? IEEE treats them as the same, and Spark has had bugs where -0.0 can cause issues. Does creating a column so that we can pull a scalar out of it cost any performance? It will cost some memory at least. Do we know where cuDF is normalizing this? I just want these questions answered before we put in a change like this.

Address review feedback: instead of routing all float/double scalar
creation through ColumnVector (which allocates device memory), detect
-0.0 via raw bit comparison and only use the slow path for that specific
value. All other values continue to use the fast Scalar.fromDouble/
Scalar.fromFloat path, making the common-case cost zero.

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor
Signed-off-by: Allen Xu <allxu@nvidia.com>
@wjxiz1992
Collaborator Author

wjxiz1992 commented Mar 12, 2026

@revans2 Let me add more background on the UT fix work:

  1. we want to increase test coverage

  2. The method is to port Spark unit tests directly (by extending them) via the rapidsTest framework, so they run on GPU.

  3. We want the GPU to produce exactly the same behavior as the CPU. For this case, even though we know 0.0 == -0.0, if the CPU produces -0.0, the GPU should also produce -0.0. Another check in Spark is this compare; it intentionally distinguishes them.

  4. The sign bit is lost at Java_ai_rapids_cudf_Scalar_makeFloat64Scalar.

The debug code used in the suite was:

testRapids("DEBUG #14116 - trace negative zero through join") {
  val negZeroRow = java.util.Arrays.asList(Row(-0.0))
  val posZeroRow = java.util.Arrays.asList(Row(0.0))
  val schema = org.apache.spark.sql.types.StructType(Seq(
    org.apache.spark.sql.types.StructField(
      "d", org.apache.spark.sql.types.DoubleType)))
  withTempView("v1", "v2") {
    spark.createDataFrame(negZeroRow, schema)
      .createTempView("v1")
    spark.createDataFrame(posZeroRow, schema)
      .createTempView("v2")

    def bits(d: Double): String =
      java.lang.Double.doubleToRawLongBits(d).toHexString

    val sb = new StringBuilder
    import com.nvidia.spark.rapids.{GpuScalar, Arm}
    Arm.withResource(
      GpuScalar((-0.0).asInstanceOf[Any],
        org.apache.spark.sql.types.DoubleType)
    ) { gs =>
      val s = gs.getBase
      sb.append(s"gpuscalar=${bits(s.getDouble)} ")
    }

    val tests = Seq(
      "plain_v1" -> "SELECT d FROM v1",
      "subq_no_join" -> "SELECT (SELECT d FROM v1)",
      "join_no_subq" ->
        "SELECT v1.d FROM v1 JOIN v2 ON v1.d = v2.d",
      "subq_with_join" -> ("SELECT (SELECT v1.d " +
        "FROM v1 JOIN v2 ON v1.d = v2.d)")
    )
    tests.foreach { case (name, query) =>
      val d = sql(query).collect().head.getDouble(0)
      sb.append(s"$name=${bits(d)} ")
    }

    val msg = sb.toString
    assert(msg.contains("gpuscalar=8000000000000000"),
      s"GpuScalar lost -0.0. All results: $msg")
    assert(msg.contains("subq_no_join=8000000000000000"),
      s"subq_no_join lost -0.0. All results: $msg")
    assert(msg.contains("subq_with_join=8000000000000000"),
      s"subq_with_join lost -0.0. All results: $msg")
  }
}

Test output was:

cudf_scalar=0 cudf_col=0 plain_v1=8000000000000000 subq_no_join=0 join_no_subq=8000000000000000 subq_with_join=0
Test            Bits              Meaning
cudf_scalar     0                 Scalar.fromDouble(-0.0) → 0.0, sign bit lost
plain_v1        8000000000000000  Direct SELECT via column path, -0.0 preserved
subq_no_join    0                 Scalar subquery (no join), goes through GpuScalar → sign bit lost
join_no_subq    8000000000000000  Join without scalar subquery, -0.0 preserved
subq_with_join  0                 Scalar subquery + join, same GpuScalar path → sign bit lost

==================================

A more dedicated test:

NegativeZeroScalarRepro.java

import ai.rapids.cudf.ColumnVector;
import ai.rapids.cudf.Scalar;

/**
 * Standalone reproducer: cuDF Scalar.fromDouble(-0.0) loses the IEEE 754 sign bit.
 *
 * Run with:
 *   javac -cp <rapids-uber-jar> NegativeZeroScalarRepro.java
 *   java  -cp .:<rapids-uber-jar> NegativeZeroScalarRepro
 *
 * Expected: Scalar path should preserve -0.0 (sign bit = 0x8000000000000000)
 * Actual:   Scalar path normalizes -0.0 to 0.0 (sign bit lost)
 *
 * See: https://github.com/NVIDIA/spark-rapids/issues/14116
 */
public class NegativeZeroScalarRepro {
  public static void main(String[] args) {
    System.out.println("=== cuDF Scalar -0.0 sign bit reproducer ===\n");
    int failures = 0;

    // --- Double tests ---
    System.out.println("--- Double (-0.0d) ---");

    // Path 1: Scalar.fromDouble — this is the buggy path
    try (Scalar s = Scalar.fromDouble(-0.0d)) {
      long bits = Double.doubleToRawLongBits(s.getDouble());
      boolean preserved = bits == 0x8000000000000000L;
      System.out.printf("  Scalar.fromDouble(-0.0):        bits=0x%016x  %s%n",
          bits, preserved ? "PASS (sign bit preserved)" : "FAIL (sign bit lost)");
      if (!preserved) failures++;
    }

    // Path 2: ColumnVector — this preserves the bit pattern
    try (ColumnVector cv = ColumnVector.fromDoubles(-0.0d);
         Scalar s = cv.getScalarElement(0)) {
      long bits = Double.doubleToRawLongBits(s.getDouble());
      boolean preserved = bits == 0x8000000000000000L;
      System.out.printf("  ColumnVector(-0.0).getScalar:   bits=0x%016x  %s%n",
          bits, preserved ? "PASS (sign bit preserved)" : "FAIL (sign bit lost)");
      if (!preserved) failures++;
    }

    // --- Float tests ---
    System.out.println("\n--- Float (-0.0f) ---");

    // Path 1: Scalar.fromFloat
    try (Scalar s = Scalar.fromFloat(-0.0f)) {
      int bits = Float.floatToRawIntBits(s.getFloat());
      boolean preserved = bits == 0x80000000;
      System.out.printf("  Scalar.fromFloat(-0.0f):        bits=0x%08x  %s%n",
          bits, preserved ? "PASS (sign bit preserved)" : "FAIL (sign bit lost)");
      if (!preserved) failures++;
    }

    // Path 2: ColumnVector
    try (ColumnVector cv = ColumnVector.fromFloats(-0.0f);
         Scalar s = cv.getScalarElement(0)) {
      int bits = Float.floatToRawIntBits(s.getFloat());
      boolean preserved = bits == 0x80000000;
      System.out.printf("  ColumnVector(-0.0f).getScalar:  bits=0x%08x  %s%n",
          bits, preserved ? "PASS (sign bit preserved)" : "FAIL (sign bit lost)");
      if (!preserved) failures++;
    }

    // --- Positive zero sanity check ---
    System.out.println("\n--- Positive zero (sanity check) ---");
    try (Scalar s = Scalar.fromDouble(0.0d)) {
      long bits = Double.doubleToRawLongBits(s.getDouble());
      boolean ok = bits == 0L;
      System.out.printf("  Scalar.fromDouble(0.0):         bits=0x%016x  %s%n",
          bits, ok ? "PASS" : "FAIL");
      if (!ok) failures++;
    }
    try (Scalar s = Scalar.fromFloat(0.0f)) {
      int bits = Float.floatToRawIntBits(s.getFloat());
      boolean ok = bits == 0;
      System.out.printf("  Scalar.fromFloat(0.0f):         bits=0x%08x  %s%n",
          bits, ok ? "PASS" : "FAIL");
      if (!ok) failures++;
    }

    // --- User-visible impact ---
    System.out.println("\n--- User-visible impact (string representation) ---");
    System.out.printf("  Java Double.toString(-0.0): \"%s\"%n", Double.toString(-0.0));
    System.out.printf("  Java Double.toString(0.0):  \"%s\"%n", Double.toString(0.0));
    System.out.println("  => cast(-0.0 as string) would return \"0.0\" on GPU"
        + " instead of \"-0.0\" on CPU");

    System.out.printf("%n=== Result: %d failure(s) ===%n", failures);
    System.exit(failures > 0 ? 1 : 0);
  }
}

Repro:

// first compile
javac -cp <PATH_TO>/rapids-4-spark_2.12-26.04.0-SNAPSHOT-cuda12.jar NegativeZeroScalarRepro.java
...
// run
RAPIDS_JAR=<PATH_TO>/rapids-4-spark_2.12-26.04.0-SNAPSHOT-cuda12.jar && SLF4J_JAR=<PATH_TO>/org/slf4j/slf4j-api/1.7.36/slf4j-api-1.7.36.jar &&  java -cp .:${RAPIDS_JAR}:${SLF4J_JAR} NegativeZeroScalarRepro

==================================================

=== cuDF Scalar -0.0 sign bit reproducer ===

--- Double (-0.0d) ---
  Scalar.fromDouble(-0.0):        bits=0x0000000000000000  FAIL (sign bit lost)
  ColumnVector(-0.0).getScalar:   bits=0x8000000000000000  PASS (sign bit preserved)

--- Float (-0.0f) ---
  Scalar.fromFloat(-0.0f):        bits=0x00000000  FAIL (sign bit lost)
  ColumnVector(-0.0f).getScalar:  bits=0x80000000  PASS (sign bit preserved)

--- Positive zero (sanity check) ---
  Scalar.fromDouble(0.0):         bits=0x0000000000000000  PASS
  Scalar.fromFloat(0.0f):         bits=0x00000000  PASS

--- User-visible impact (string representation) ---
  Java Double.toString(-0.0): "-0.0"
  Java Double.toString(0.0):  "0.0"
  => cast(-0.0 as string) would return "0.0" on GPU instead of "-0.0" on CPU

=== Result: 2 failure(s) ===
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

I didn't file an issue to cuDF, as this 0.0 issue in this use case is more specific to Spark, but I can file one if you think it's needed.

  1. I updated the logic to use the ColumnVector workaround only for -0.0, for performance. All other values continue to use the fast Scalar.fromDouble/Scalar.fromFloat path.

@wjxiz1992 wjxiz1992 self-assigned this Mar 12, 2026
@revans2
Collaborator

revans2 commented Mar 12, 2026

@wjxiz1992 I think you might have misunderstood my comments. -0.0 vs 0.0 is a very minor thing. It feels very inconsequential to have a scalar value be different. I get that we want to match all of Spark's unit tests, but if it is going to make the code less maintainable and possibly slower, then we need to think about the cost of being 100% compatible with Spark and decide if it is worth it. We also need to think about the change that we are making and ask ourselves if this is the right place to make it. Here we are making multiple tiny memory allocations and copies so that we can preserve the -0.0. That is not worth it to me, which is why I asked where the error came from. We are not explicitly normalizing -0.0 to 0.0 in the code you pointed to, so I want to understand if there is something we can do there that would let us fix this in a much more efficient way. I also want to understand who added the test and why they added it. What possible case exists where we need to preserve this, especially when Spark strips it out in so many situations?

For me I am +1 if we can do this in a way that does not cause any more GPU memory allocations or GPU data movement, or make the code more difficult to work with. I am -1 in all other cases unless we can prove that this is critical to some real world situation.

@wjxiz1992
Collaborator Author

wjxiz1992 commented Mar 13, 2026

@revans2 I see, thanks for the nice explanation!

  1. I've changed my earlier assumption that "correctness (consistency with Spark) > performance"; it really depends.
  2. Yes, the JNI code doesn't explicitly normalize it (the sign bit loss is just a side effect). makeFloat64Scalar creates a cudf::scalar_type_t<double> and calls set_value(static_cast<double>(value)). There is no explicit normalization here; the jdouble → double cast preserves the sign bit. Then, in the cuDF C++ layer (cudf/cpp/src/scalar/scalar.cpp), fixed_width_scalar::set_value calls _data.set_value_async(value, stream), where _data is an rmm::device_scalar. This performs a host-to-device copy. The sign bit is lost somewhere in this RMM path; neither the JNI nor the cuDF C++ code explicitly normalizes -0.0.
  3. Let me file a cuDF issue for more investigation and discussion; this tiny unit test doesn't justify the added GPU memory costs introduced in this PR.
  4. All [AutoSparkUT]-prefixed PRs originate from a test-coverage concern raised by leadership; Mahone then introduced this "RapidsTest-extends" approach to migrate all Spark unit tests. And no, it's not from any actual customer use case; you can regard it as just a KPI for "how many Spark unit tests have been migrated". All problematic tests are currently excluded in this file, and my job is to remove them from exclusion. I'll be more cautious so this effort doesn't hurt the spark-rapids project.

@wjxiz1992
Collaborator Author

@revans2 Update: I traced the root cause down to the RMM layer.

Root cause: rmm::device_uvector::set_element_async (device_uvector.hpp L222-226) has a zero-optimization that uses cudaMemsetAsync when value == value_type{0}. Since IEEE 754 defines -0.0 == 0.0 as true, -0.0 triggers the memset path, which clears all bits including the sign bit.

cudaMemcpy(-0.0):   bits=0x8000000000000000  PASS (sign bit preserved)
cudaMemset(0):      bits=0x0000000000000000  FAIL (sign bit lost)
-0.0 == 0.0 ? TRUE (IEEE 754)

The call chain is: Scalar.fromDouble → JNI makeFloat64Scalar → cudf::fixed_width_scalar::set_value → rmm::device_scalar::set_value_async → device_uvector::set_element_async → hits the == 0 optimization → cudaMemsetAsync → sign bit gone.

Filed as rapidsai/rmm#2298 with a CUDA reproducer and suggested fix (use memcmp instead of == for floating-point zero detection).

For this PR: since the proper fix belongs in RMM (zero GPU memory overhead once fixed there), I'll keep the exclude in place and revisit after the RMM fix lands.
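To illustrate why the zero-optimization misfires (a plain Java analogy, not actual RMM code): the equality comparison RMM performs treats -0.0 as zero, while a bitwise comparison (the analogue of the suggested memcmp fix) does not:

```java
public class ZeroOptimizationPitfall {
  public static void main(String[] args) {
    double negZero = -0.0d;

    // This is the kind of check RMM effectively performs (value == value_type{0}):
    // it is true for -0.0, so the memset(0) fast path is taken and the
    // sign bit is wiped out.
    System.out.println(negZero == 0.0d); // true

    // A bitwise comparison (the memcmp analogue) sees that the bit
    // patterns differ, so the fast path would correctly be skipped.
    boolean bitwiseZero =
        Double.doubleToRawLongBits(negZero) == Double.doubleToRawLongBits(0.0d);
    System.out.println(bitwiseZero); // false
  }
}
```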




Successfully merging this pull request may close these issues.

[AutoSparkUT]"normalize special floating numbers in subquery" in SQLQuerySuite failed